XML Format Guidelines for the TUNA Corpus

Authors

  • Albert Gatt
  • Kees van Deemter
Abstract

This document forms part of the 2008 distribution of the TUNA Corpus, Version 1.0. This is the first public release of the complete TUNA Corpus of Referring Expressions. A subset of the corpus was used in the first Shared Task and Evaluation Challenge for NLG, the Attribute Selection for the Generation of Referring Expressions Challenge (ASGRE), co-located with the Workshop on Using Corpora in NLG. A subset is also being used for the second edition of the Challenge (the REG Challenge 2008), to be held in Ohio in June 2008, co-located with the International Conference on NLG. Both of these previous releases consist exclusively of the singular referring expressions in the TUNA corpus; moreover, the annotation for both ASGRE 2007 and REG 2008 has a different format which was specifically designed for the tasks involved. This release contains the final version of the TUNA annotation, and includes the full corpus, that is, both singular and plural descriptions.

Similar resources

Designing the Latvian Speech Recognition Corpus

In this paper the authors present the first Latvian speech corpus designed specifically for speech recognition purposes. The paper outlines the decisions made in the corpus design process through analysis of related work on speech corpora creation for different languages. The authors also provide the guidelines that were used for the creation of the Latvian speech recognition corpus. The corpus ...

Cost-based attribute selection for GRE

In this paper we discuss several approaches to the problem of content determination for the generation of referring expressions (GRE) using the graph-based framework of Krahmer et al. (2003). This work was carried out in the context of the First NLG Shared Task and Evaluation Challenge on Attribute Selection for Referring Expression Generation. In the shared task proper of the Challenge the outp...

Cost-based attribute selection for GRE (GRAPH-SC/GRAPH-FP)

In this paper we discuss several approaches to the problem of content determination for the generation of referring expressions (GRE) using the graph-based framework of Krahmer et al. (2003). This work was carried out in the context of the First NLG Shared Task and Evaluation Challenge on Attribute Selection for Referring Expression Generation. In the shared task proper of the Challenge the outp...

Processing XML Text with Python and ElementTree a Practical Experience

In this paper, we evaluate the use of XML format as an internal format for storing texts in linguistic corpora, and describe our experience in using the ElementTree Python XML parser in the Slovak National Corpus.

Constraints for corpora development and validation

In this paper we consider corpora as a set of XML documents. The guidelines for the creation of the corpora determine the semantics of the data, stored in them. Usually the guidelines prescribe the actual structure of the corpora, the used symbols, their meaning and the relations among them. Ideally, the software supporting the creation of a corpus has to allow all the constraints that follow f...

Publication date: 2008